Abstract. This study deals with the missing link prediction problem: 
the problem of predicting the existence of missing connections between 
entities of interest. We address link prediction using coupled analysis of 
relational datasets represented as heterogeneous data, i.e., datasets in the 
form of matrices and higher-order tensors. We propose to use an approach 
based on probabilistic interpretation of tensor factorisation models, i.e., 
Generalised Coupled Tensor Factorisation, which can simultaneously fit 
a large class of tensor models to higher-order tensors/matrices with com- 
mon latent factors using different loss functions. Numerical experiments 
demonstrate that joint analysis of data from multiple sources via coupled
factorisation improves the link prediction performance and that selecting
the right loss function and tensor model is crucial for accurately predicting
missing links.
1 Introduction 

Recent technological advances, such as the Internet, multi-media devices and
social networks, provide an abundance of relational data. For instance, in retail
recommender systems, in addition to retail data showing who has bought which items,
we may also have access to customers' social networks, i.e., who is friends with 
whom. In such complex problems, jointly analyzing data from multiple sources 
has great potential to increase our ability for capturing the underlying structure 
in data. Data fusion, therefore, is a viable candidate for addressing the challeng- 
ing link prediction problem. Applications in many areas including recommender 
systems and social network analysis deal with link prediction, i.e., the problem of 
inferring whether there is a relation between the entities of interest. For instance, 
if a customer buys an item, the customer and the item can be considered to be 
linked. The task of recommending other items the customer may be interested in 
can be cast as a missing link prediction problem. However, the results are likely 
to be poor if the prediction is done in isolation on a single view of data. Such
datasets, whilst large in dimension, are already very sparse [1] and potentially
represent only a very incomplete picture of the reality [2]. Therefore, relational
data from other sources is often incorporated into link prediction models [3].

Fig. 1: A third-order tensor coupled with two matrices in two different modes.

Matrix factorisations have proved to be very useful in recommender systems
[4]. An effective way of including side information via additional relational data
in a link prediction model is to represent different relations as a collection of
matrices. Subsequently, this collection of matrices is jointly analyzed using
collective matrix factorisation [5,6]. In many applications, however, matrices are
not sufficient for a faithful representation of multiple attributes, and higher- 
order tensor and matrix factorisation methods are needed. An influential study 
in this direction is by Banerjee et al. [7], where a general clustering method for 
joint analysis of heterogeneous data has been studied. The goal here is clustering 
entities based on multiple relations, where each relation is represented as a matrix 
(e.g., movies by review words matrix showing movie reviews) or a higher-order 
tensor (e.g., movies by viewers by actors tensor showing viewers' ratings). 

In this paper, we address the link prediction problem using coupled analysis
of datasets in the form of matrices and higher-order tensors. As an example
application, we study a real-world GPS (Global Positioning System) dataset
[8] for location-activity recommendation: given an incomplete dataset
showing which users perform which activities at various locations, we would like
to fill in the missing links between (user, activity, location) triplets ($X_1$). We
also make use of additional sources of information showing the locations visited
by users based on GPS trajectories ($X_2$) and the features of locations in terms
of the number of different points of interest at each location ($X_3$) (Figure 1).

Various algorithms have been proposed in the literature for coupled analysis 
of heterogeneous data. Lin et al. [9] address the community extraction problem
on multi-relational data using a coupled factorisation approach that models
higher-order tensors with a specific tensor model, i.e., CANDECOMP/PARAFAC (CP)
[10,11], and a Kullback-Leibler (KL) divergence-based cost function. Also, a recent
study by Narita et al. [12] has addressed the tensor completion problem
with additional data using a Euclidean distance-based loss function. Unlike
previous studies, we use an approach, i.e., Generalized Coupled Tensor Factori- 
sations (GCTF) [13], which enables us to investigate alternative tensor models 
and cost functions for coupled analysis of heterogeneous data. The main contri- 
butions of this paper can be summarized as follows: 

— We consider different tensor models, i.e., CP and Tucker [14], and loss func- 
tions, i.e., KL-divergence and Euclidean distance, for joint analysis of het- 
erogeneous data for link prediction using the GCTF framework. 
— Using a real GPS data set, we demonstrate that coupled tensor factorisations
outperform low-rank approximations of a single tensor in terms of missing
link prediction, and that the selection of the tensor model as well as the loss
function has a significant effect on link prediction performance.

— We also demonstrate that it is possible to address the cold-start problem in 
link prediction using the proposed coupled models. 

The rest of the paper is organized as follows. In §2, we survey the related 
work on link prediction as well as joint factorisation of data. §3 introduces our 
algorithmic framework, i.e., GCTF, while §4 discusses its adaptation for the link 
prediction problem. Experimental results on a real dataset are presented in §5. 
Finally, we conclude with future work in §6. 

2 Related Work 

In order to deal with the challenging task of link prediction, many studies have
proposed to exploit the multi-relational nature of the data and showed improved
link prediction performance by incorporating related sources of information in
their modeling framework. For instance, Taskar et al. [15] use relational Markov
networks that model links between entities as well as their attributes. Popescul
and Ungar [16] extract relational features to learn the existence of links (see [3]
for a comprehensive list of similar studies).

For analysis of multi-relational data, Singh and Gordon [6] as well as Long et 
al. [5] introduce collective matrix factorisation. Matrix factorisation-based tech- 
niques have proved useful in terms of capturing the underlying patterns in data, 
e.g., in recommender systems [4], and joint analysis of matrices has been widely 
applied in numerous disciplines including signal processing [17] and
bioinformatics [18]. Recent studies extend collective matrix factorisation to coupled analysis
of multi-relational data in the form of matrices and higher-order tensors [19,7] 
since in many disciplines, relations can be defined among more than two entities, 
e.g., when a user engages in an activity at a certain location, a relation can be 
defined over user, activity and location entities. Banerjee et al. [7] introduced 
a multi-way clustering approach for relational and multi-relational data where 
coupled analysis of heterogeneous data was studied using minimum Bregman in- 
formation. Lin et al. [9] also discussed coupled matrix and tensor factorisations 
using KL-divergence modeling higher-order tensors by fitting a CP model. While 
these studies use alternating algorithms, Acar et al. [20] proposed an all-at-once 
optimization approach for coupled analysis. 

Missing link prediction is also closely related to matrix and tensor completion 
studies. By using a low-rank structure of a data set, it is possible to recover 
missing entries for matrices [21] and higher-order tensors [22]. 

Note that we focus on missing link prediction in this paper and do not address 
the temporal link prediction problem, where snapshots of the set of links up 
to time t are given and the goal is to predict the links at time t + 1. Tensor 
factorisations have previously been used for temporal link prediction [23] but 
using only a single source of information. 
3 Methodology 

In this study, we use the Generalized Coupled Tensor Factorisation framework [13]
for coupled factorisation of several tensors and matrices to fill in the missing
links in observed data. A generalized tensor factorisation problem is specified by
an observed tensor $X$ (with possibly missing entries) and a collection of latent
tensors to be estimated, $Z_{1:|\alpha|} = \{Z_\alpha\}$ for $\alpha = 1 \ldots |\alpha|$.

The GCTF framework is a generalisation of Probabilistic Latent Tensor
Factorisation (PLTF) [24] to coupled factorisation. In this framework, the goal is to
compute an approximate factorisation of $X$ in terms of a product of individual
factors $Z_\alpha$. Here, we define $V$ as the set of all indices in a model, $V_0$ as the set
of visible indices, $V_\alpha$ as the set of indices in $Z_\alpha$, and $\bar{V}_\alpha = V - V_\alpha$ as the set of
all indices not in $Z_\alpha$. We use lower-case letters such as $v_\alpha$ to refer to a particular setting
of indices in $V_\alpha$.

PLTF tries to solve the following approximation problem:

$$X(v_0) \approx \hat{X}(v_0) = \sum_{\bar{v}_0} \prod_\alpha Z_\alpha(v_\alpha). \tag{1}$$

Since the product $\prod_\alpha Z_\alpha(v_\alpha)$ is collapsed over a set of indices, the
factorisation is latent. The approximation problem is cast as an optimisation problem
minimizing the divergence $d(X, \hat{X})$, where $d$ is a divergence (a quasi-squared-
distance) between the observed data $X$ and the model prediction $\hat{X}$. In applications,
$d$ is typically Euclidean (EUC), Kullback-Leibler (KL) or Itakura-Saito (IS) [13].

In this paper, we use non-negative variants of the two most widely used low-rank
tensor factorisation models, i.e., the Tucker model and the more restricted CP
model, as baseline methods in §5. These models can be defined in the PLTF
notation as follows. Given a three-way tensor $X$, its CP model is defined as:

$$X(i,j,k) \approx \hat{X}(i,j,k) = \sum_r Z_1(i,r) Z_2(j,r) Z_3(k,r) \tag{2}$$

where the index sets are $V = \{i,j,k,r\}$, $V_0 = \{i,j,k\}$, $V_1 = \{i,r\}$, $V_2 = \{j,r\}$ and
$V_3 = \{k,r\}$. A Tucker model of $X$ is defined in the PLTF notation as follows:

$$X(i,j,k) \approx \hat{X}(i,j,k) = \sum_{p,q,r} Z_1(i,p) Z_2(j,q) Z_3(k,r) Z_4(p,q,r) \tag{3}$$

where the index sets are $V = \{i,j,k,p,q,r\}$, $V_0 = \{i,j,k\}$, $V_1 = \{i,p\}$, $V_2 = \{j,q\}$,
$V_3 = \{k,r\}$ and $V_4 = \{p,q,r\}$.
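
The index notation in (2) and (3) maps directly onto an Einstein-summation call. The following is a minimal sketch, assuming NumPy; the sizes, the random factors and the function names `cp_model` and `tucker_model` are illustrative and not taken from the paper.

```python
import numpy as np

def cp_model(Z1, Z2, Z3):
    # Eq. (2): sum over the latent index r of Z1(i,r) Z2(j,r) Z3(k,r)
    return np.einsum("ir,jr,kr->ijk", Z1, Z2, Z3)

def tucker_model(Z1, Z2, Z3, Z4):
    # Eq. (3): sum over p, q, r of Z1(i,p) Z2(j,q) Z3(k,r) Z4(p,q,r)
    return np.einsum("ip,jq,kr,pqr->ijk", Z1, Z2, Z3, Z4)

rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 3, 2          # illustrative sizes only
Z1, Z2, Z3 = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))

X_cp = cp_model(Z1, Z2, Z3)

# CP is the special case of Tucker with a superdiagonal core tensor.
core = np.zeros((R, R, R))
core[np.arange(R), np.arange(R), np.arange(R)] = 1.0
X_tucker = tucker_model(Z1, Z2, Z3, core)
```

With the superdiagonal core the two reconstructions coincide, which is one way to see why CP is described as the more restricted of the two models.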

The update equation for non-negative generalized tensor factorisation can be
used for both (2) and (3) and is expressed as:

$$Z_\alpha \leftarrow Z_\alpha \circ \frac{\Delta_\alpha(M \circ \hat{X}^{-p} \circ X)}{\Delta_\alpha(M \circ \hat{X}^{1-p})} \tag{4}$$

where $\circ$ is the Hadamard (element-wise) product, powers and division are also
taken element-wise, and $M$ is a 0-1 mask
array with $M(v_0) = 1$ ($M(v_0) = 0$) if $X(v_0)$ is observed (missing). Here $p$
determines the cost function, i.e., $p = \{0, 1, 2\}$ correspond to the $\beta$-divergence
[25] that unifies the EUC, KL, and IS cost functions, respectively. In this iteration,
we define the tensor-valued function $\Delta_\alpha(A)$ as:

$$\Delta_\alpha(A) = \sum_{\bar{v}_\alpha} A(v_0) \prod_{\alpha' \neq \alpha} Z_{\alpha'}(v_{\alpha'}). \tag{5}$$

$\Delta_\alpha(A)$ is an object of the same size as $Z_\alpha$, obtained simply by multiplying all
factors other than the one being updated with an object of the order of the
data. Hence the key observation is that the $\Delta_\alpha$ function just computes a
tensor product and collapses this product over the indices not appearing in $Z_\alpha$,
which is algebraically equivalent to computing a marginal sum.
As an example, for the KL cost, we rewrite (4) more compactly as:

$$Z_\alpha \leftarrow Z_\alpha \circ \Delta_\alpha(M \circ X / \hat{X}) / \Delta_\alpha(M). \tag{6}$$

This update rule can be used iteratively for all non-negative $Z_\alpha$ and converges
to a local minimum provided we start from some non-negative initial values.
For updating $Z_\alpha$, we need to compute the $\Delta$ function twice, for the arguments
$A = M \circ \hat{X}^{-p} \circ X$ and $A = M \circ \hat{X}^{1-p}$.
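
To make the role of the mask and of $p$ concrete, the update (4) can be sketched for a three-way CP model as below. This is an illustrative NumPy sketch, not the authors' implementation: the helper names `delta`, `update` and `kl`, the small constant `eps` that guards against division by zero, and the synthetic data are all assumptions.

```python
import numpy as np

def delta(A, factors, skip):
    # Eq. (5) for a CP model: multiply A by all factors except `skip`
    # and sum over every index that does not appear in that factor.
    subs = ["ir", "jr", "kr"]
    others = [s for n, s in enumerate(subs) if n != skip]
    mats = [f for n, f in enumerate(factors) if n != skip]
    return np.einsum("ijk," + ",".join(others) + "->" + subs[skip], A, *mats)

def update(X, M, factors, p, eps=1e-12):
    # One sweep of update (4) over all factors; p=0 is EUC, p=1 is KL.
    for a in range(3):
        Xhat = np.einsum("ir,jr,kr->ijk", *factors) + eps
        num = delta(M * Xhat ** (-p) * X, factors, a)
        den = delta(M * Xhat ** (1 - p), factors, a)
        factors[a] = factors[a] * num / (den + eps)
    return factors

def kl(X, Xhat, M, eps=1e-12):
    # Generalised KL divergence restricted to the observed entries.
    return np.sum(M * (X * np.log((X + eps) / (Xhat + eps)) - X + Xhat))

rng = np.random.default_rng(0)
true = [rng.random((n, 2)) for n in (6, 7, 4)]
X = np.einsum("ir,jr,kr->ijk", *true)              # synthetic rank-2 data
M = (rng.random(X.shape) < 0.7).astype(float)      # 1 = observed, 0 = missing
Z = [rng.random((n, 2)) + 0.1 for n in (6, 7, 4)]  # non-negative initial values

before = kl(X, np.einsum("ir,jr,kr->ijk", *Z), M)
for _ in range(50):
    Z = update(X, M, Z, p=1)
after = kl(X, np.einsum("ir,jr,kr->ijk", *Z), M)
```

Because the factors are only ever multiplied by non-negative ratios, they stay non-negative throughout, and missing entries contribute to neither the numerator nor the denominator.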



3.1 Generalized Coupled Tensor Factorisation 

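The Generalised Coupled Tensor Factorisation model takes the PLTF model one
step further: we have multiple observed tensors $X_\nu$ that are factorised
simultaneously:

$$X_\nu(v_{0,\nu}) \approx \hat{X}_\nu(v_{0,\nu}) = \sum_{\bar{v}_{0,\nu}} \prod_\alpha Z_\alpha(v_\alpha)^{R^{\nu,\alpha}} \tag{7}$$

where $\nu = 1, \ldots, |\nu|$ and $R$ is a coupling matrix defined as follows:

$$R^{\nu,\alpha} = \begin{cases} 1 & \text{if } X_\nu \text{ and } Z_\alpha \text{ are connected} \\ 0 & \text{otherwise.} \end{cases} \tag{8}$$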

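Note that, distinct from the PLTF model, there are multiple visible index sets ($V_{0,\nu}$)
in the GCTF model, each specifying the attributes of the observed tensor $X_\nu$.

The inference, i.e., estimation of the shared latent factors $Z_\alpha$, can be achieved
via iterative optimisation (see [13]). For non-negative data and factors, one can
obtain the following compact fixed point equation where each $Z_\alpha$ is updated in
an alternating fashion fixing the other factors $Z_{\alpha'}$, for $\alpha' \neq \alpha$:

$$Z_\alpha \leftarrow Z_\alpha \circ \frac{\sum_\nu R^{\nu,\alpha} \Delta_{\alpha,\nu}(M_\nu \circ \hat{X}_\nu^{-p} \circ X_\nu)}{\sum_\nu R^{\nu,\alpha} \Delta_{\alpha,\nu}(M_\nu \circ \hat{X}_\nu^{1-p})} \tag{9}$$

where $M_\nu$ is a 0-1 mask array with $M_\nu(v_{0,\nu}) = 1$ ($M_\nu(v_{0,\nu}) = 0$) if $X_\nu(v_{0,\nu})$ is
observed (missing). Here $p$, as in (4), determines the cost function, i.e., $p = \{0, 1\}$
corresponds to the EUC and KL cost functions, respectively (see Table 1). In this
iteration, the key quantity is the $\Delta_{\alpha,\nu}$ function defined as follows:

$$\Delta_{\alpha,\nu}(A) = \sum_{v_{0,\nu} \cap \bar{v}_\alpha} A(v_{0,\nu}) \sum_{\bar{v}_{0,\nu} \cap \bar{v}_\alpha} \prod_{\alpha' \neq \alpha} Z_{\alpha'}(v_{\alpha'})^{R^{\nu,\alpha'}}. \tag{10}$$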
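Table 1: Update rules for different p values

p   Cost Function      Multiplicative Update Rule
0   Euclidean          $Z_\alpha \leftarrow Z_\alpha \circ \frac{\sum_\nu R^{\nu,\alpha} \Delta_{\alpha,\nu}(M_\nu \circ X_\nu)}{\sum_\nu R^{\nu,\alpha} \Delta_{\alpha,\nu}(M_\nu \circ \hat{X}_\nu)}$
1   Kullback-Leibler   $Z_\alpha \leftarrow Z_\alpha \circ \frac{\sum_\nu R^{\nu,\alpha} \Delta_{\alpha,\nu}(M_\nu \circ \hat{X}_\nu^{-1} \circ X_\nu)}{\sum_\nu R^{\nu,\alpha} \Delta_{\alpha,\nu}(M_\nu)}$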



Assuming that all modes have the same size, i.e., the tensor is
an $N \times N \times N$ array while the coupled matrix is of size $N \times N$, the leading
term in the computational complexity of the coupled model is due to the
updates for the tensor model. For an $F$-component CP model, for instance, that
would be $O(N^3 F)$. The updates can be implemented by taking into account the
sparsity pattern of the data.



4 Link Prediction with Coupled Tensor Factorisation 

In this section, by using the GCTF framework, we propose a solution for the link
prediction task with different coupled models and loss functions. We have a
three-way observation tensor $X_1$ with elements 0 and 1, where 0 denotes a known
absent link and 1 denotes a known present link, and two auxiliary matrices $X_2$
and $X_3$ that provide side information. Our aim is to restore the missing links
in $X_1$. This is a difficult link prediction problem since $X_1$ contains less than
1% of all possible links, and an entire slice of $X_1$ may be missing. Using a low-rank
factorisation of a single tensor to estimate missing entries will be ineffective, in
particular, in the case of structured missing data such as missing slices. In terms 
of coupled models, we are not restricted to a specific model topology, i.e., since 
we use the GCTF framework, we can design application-specific models. The 
choice of a particular factorisation model is strongly guided by the application; 
therefore, we first give a brief description of the data set. 

The UCLAF dataset³ [8] is extracted from GPS data that include information
on three types of entities: user, location and activity. The relations between user-
location-activity triplets are used to construct a three-way tensor $X_1$. In tensor
$X_1$, an entry $X_1(i, j, k)$ indicates the frequency of user $i$ visiting location $j$ and
doing activity $k$ there; otherwise, it is 0. Since we address the link prediction
problem in this study, we define the user-location-activity tensor $X_1$ as:

$$X_1(i,j,k) = \begin{cases} 1 & \text{if user } i \text{ visits location } j \text{ and performs activity } k \text{ there} \\ 0 & \text{otherwise.} \end{cases}$$
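
In code, this binarisation is a one-line thresholding of the frequency tensor. The snippet below is a small sketch with made-up toy counts (the array `counts` is hypothetical and is not the UCLAF data):

```python
import numpy as np

# Hypothetical frequency tensor: counts[i, j, k] = number of times user i
# performed activity k at location j (toy values, not the UCLAF data).
counts = np.array([[[0, 3], [1, 0]],
                   [[0, 0], [2, 5]]])

X1 = (counts > 0).astype(float)   # 1 = link present, 0 = no link
```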



The dataset has been constructed by clustering raw GPS points into 168 
meaningful locations and manually parsing the user comments attached to the 
GPS data into activity annotations for the 168 locations. Consequently, the data 
consists of 164 users, 168 locations and 5 different types of activities. (See [8] for 
details). 



³ http://www.cse.ust.hk/~vincentz/aaai10.uclaf.data.mat
Additionally, the collected data includes side information: the user-location
preferences from the GPS trajectory data as well as the location features
from the POI (points of interest) database, represented by matrices $X_2$ and $X_3$,
respectively. In our model, the user-location preferences matrix of size $I \times J$ has
entries $X_2(i, m)$, where $I$ is the number of users and $J$ is the number of locations,
and we use index $m$ as the location index instead of $j$. The rationale behind this
choice is to relax the model, as the entries in $X_1$ and $X_2$ measure distinct
quantities: $X_2(i, m)$ represents the frequency of user $i$ visiting location $m$ and
staying there longer than a time threshold, while $X_1$ only indicates an activity by a
specific user $i$ at location $j$. Because the location indices $j$ and $m$ are kept separate,
$X_1$ and $X_2$ are coupled only via a common factor over the users. Finally, we represent
the location-feature values with matrix $X_3$ of size $J \times N$, where $J$ is the number
of locations, indexed by $j$ as in $X_1$, and $N$ is the number
of features. In particular, an entry $X_3(j, n)$ represents the number of different
POIs at location $j$. Using the location features, we can gain information about
location similarities based on their feature values.

In this data set, 18 users have no location and activity information. Therefore,
we have used the remaining 146 users. In order to decrease the effect of outliers,
the location-feature matrix is preprocessed as follows: $X_3(j, n) \leftarrow 1 + \log(X_3(j, n))$
if $X_3(j, n) > 0$; otherwise, $X_3(j, n) = 0$. In our experiments, the number of users is
$I = 146$, the number of locations $J = 168$, the number of activities $K = 5$ and the number
of location features $N = 14$.
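
The log transform of the location-feature matrix can be sketched as follows. The function name `damp_counts` and the toy matrix are assumptions made for illustration; the natural logarithm is assumed, since the text does not state the base.

```python
import numpy as np

def damp_counts(X3):
    # X3(j, n) <- 1 + log(X3(j, n)) where X3(j, n) > 0, and 0 elsewhere.
    X3 = np.asarray(X3, dtype=float)
    out = np.zeros_like(X3)
    pos = X3 > 0
    out[pos] = 1.0 + np.log(X3[pos])
    return out

X3_raw = np.array([[0, 1, 10], [100, 0, 3]])   # toy POI counts
X3 = damp_counts(X3_raw)
```

A count of 1 maps to 1 and zeros stay zero, so the sparsity pattern is preserved while very large counts are compressed.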

We form two coupled models to fill in the missing links in tensor $X_1$. For
both models, we use KL divergence and Euclidean distance as the cost functions in our
non-negative decomposition problems. In the first model, we apply the coupled
approach to a CP-style tensor model by analysing the tensor $X_1$ jointly with the
additional matrices $X_2$ and $X_3$. This gives us the following model:

$$\hat{X}_1(i,j,k) = \sum_r A(i,r) B(j,r) C(k,r) \tag{11}$$

$$\hat{X}_2(i,m) = \sum_r A(i,r) D(m,r) \tag{12}$$

$$\hat{X}_3(j,n) = \sum_r B(j,r) E(n,r) \tag{13}$$

Here, we have three observed tensors that share common factors; therefore, we
have a coupled tensor factorisation problem. The coupling matrix $R$ with $|\alpha| = 5$,
$|\nu| = 3$ for this model is defined as follows:

$$R = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 1 \end{bmatrix} \quad \text{with} \quad \begin{aligned} \hat{X}_1 &= \sum A^1 B^1 C^1 D^0 E^0 \\ \hat{X}_2 &= \sum A^1 B^0 C^0 D^1 E^0 \\ \hat{X}_3 &= \sum A^0 B^1 C^0 D^0 E^1 \end{aligned} \tag{14}$$
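
The coupling matrix can be generated mechanically from a description of which factors enter which observed tensor. The snippet below is a small sketch; the dictionary `models` and the variable names are illustrative assumptions that encode the coupled CP model (11)-(13).

```python
import numpy as np

factors = ["A", "B", "C", "D", "E"]
# Which latent factors appear in the model of each observed tensor.
models = {"X1": ["A", "B", "C"], "X2": ["A", "D"], "X3": ["B", "E"]}

# Rows follow the observed tensors, columns follow the latent factors.
R = np.array([[1 if f in used else 0 for f in factors]
              for used in models.values()])

# Factors connected to more than one observed tensor are the shared ones.
shared = [f for f, col in zip(factors, R.T) if col.sum() > 1]
```

Here `shared` contains `A` and `B`, the two factors through which the side information in $X_2$ and $X_3$ reaches the tensor $X_1$.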



Note that $X_1$ and $X_2$ share the common factor matrix $A$ with entries $A(i, r)$;
we can interpret each row $A(i, :)$ as user $i$'s latent position in a $|r|$-dimensional
'preferences' space. The factor matrix $B$ with entries $B(j, r)$ represents the latent
position of location $j$ in the same preferences space. User $i$ at location $j$
tends to perform activity $k$ where the weight $A(i, r)B(j, r)$ is large for at least one
$r$, i.e., there is a match between the user's preferences and what the location 'has
to offer'. The location-specific factor $B$ is also influenced by the location-feature
matrix $X_3$.
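
For the coupled CP model (11)-(13), the general update (9) reduces to a handful of matrix and tensor products. The following NumPy sketch is illustrative only and is not the authors' code: the function name `coupled_cp_step`, the constant `eps`, the assumption that $X_2$ and $X_3$ are fully observed, and the synthetic data are all assumptions. The demonstration hides the whole slice of one user in $X_1$ to show that the shared factor $A$ is still updated through $X_2$.

```python
import numpy as np

def coupled_cp_step(X1, M1, X2, X3, F, p, eps=1e-12):
    """One sweep of update (9) for the coupled CP model (11)-(13).

    F is a dict holding the factors A, B, C, D, E. M1 masks X1 (1 = observed);
    X2 and X3 are assumed fully observed here for brevity.
    """
    def parts(X, H, M=1.0):
        # Arguments of the Delta function in the numerator and denominator.
        return M * H ** (-p) * X, M * H ** (1 - p)

    for name in "ABCDE":
        A, B, C = F["A"], F["B"], F["C"]
        h1 = np.einsum("ir,jr,kr->ijk", A, B, C) + eps
        n1, d1 = parts(X1, h1, M1)
        n2, d2 = parts(X2, A @ F["D"].T + eps)
        n3, d3 = parts(X3, B @ F["E"].T + eps)
        if name == "A":      # A appears in X1 and X2
            num = np.einsum("ijk,jr,kr->ir", n1, B, C) + n2 @ F["D"]
            den = np.einsum("ijk,jr,kr->ir", d1, B, C) + d2 @ F["D"]
        elif name == "B":    # B appears in X1 and X3
            num = np.einsum("ijk,ir,kr->jr", n1, A, C) + n3 @ F["E"]
            den = np.einsum("ijk,ir,kr->jr", d1, A, C) + d3 @ F["E"]
        elif name == "C":    # C appears only in X1
            num = np.einsum("ijk,ir,jr->kr", n1, A, B)
            den = np.einsum("ijk,ir,jr->kr", d1, A, B)
        elif name == "D":    # D appears only in X2
            num, den = n2.T @ A, d2.T @ A
        else:                # E appears only in X3
            num, den = n3.T @ B, d3.T @ B
        F[name] = F[name] * num / (den + eps)
    return F

rng = np.random.default_rng(1)
I, J, K, N, r = 8, 9, 3, 4, 2                  # toy sizes
shapes = {"A": (I, r), "B": (J, r), "C": (K, r), "D": (J, r), "E": (N, r)}
T = {k: rng.random(s) for k, s in shapes.items()}   # synthetic ground truth
X1 = np.einsum("ir,jr,kr->ijk", T["A"], T["B"], T["C"])
X2, X3 = T["A"] @ T["D"].T, T["B"] @ T["E"].T

M1 = np.ones_like(X1)
M1[0] = 0.0                                    # user 0: whole slice missing
F = {k: rng.random(s) + 0.1 for k, s in shapes.items()}
for _ in range(200):
    F = coupled_cp_step(X1, M1, X2, X3, F, p=1)

# Predicted scores for the hidden slice of user 0.
pred_slice = np.einsum("r,jr,kr->jk", F["A"][0], F["B"], F["C"])
```

For the hidden user, the $X_1$ terms in the numerator and denominator of the update for $A$ vanish, so the corresponding row of $A$ is driven entirely by $X_2$. Without the coupling, both terms would be zero and the row would be undetermined.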

Following the same line of thought, we apply the coupled approach using a
Tucker decomposition to form our second model, which is as follows:

$$\hat{X}_1(i,j,k) = \sum_{p,q,r} A(i,p) B(j,q) C(k,r) D(p,q,r) \tag{15}$$

$$\hat{X}_2(i,m) = \sum_p A(i,p) E(m,p) \tag{16}$$

$$\hat{X}_3(j,n) = \sum_q B(j,q) F(n,q) \tag{17}$$

In this model, once again, the factor $A$ is shared by $X_1$ and $X_2$, while the factor $B$
is shared by $X_1$ and $X_3$. In contrast to the coupled CP model in (11), this model
assumes that user $i$ at location $j$ tends to perform activity $k$ with the weight
$\sum_{p,q} A(i,p) B(j,q) D(p,q,r)$. Here, a latent preference space interpretation is less
intuitive but the model has more freedom to represent the dependence.
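
The coupled Tucker model (15)-(17) can be written out with the dimensions reported above. This is a sketch with random non-negative factors, intended only to show the shapes and the contraction pattern; the variable names follow the equations, and the random initialisation is an assumption.

```python
import numpy as np

I, J, K, N = 146, 168, 5, 14     # sizes reported for the UCLAF data
P = Q = R = 2                    # components per mode, as in the experiments
rng = np.random.default_rng(0)
A, B, C = rng.random((I, P)), rng.random((J, Q)), rng.random((K, R))
D = rng.random((P, Q, R))                      # full core tensor
E, F = rng.random((J, P)), rng.random((N, Q))

X1_hat = np.einsum("ip,jq,kr,pqr->ijk", A, B, C, D)   # Eq. (15)
X2_hat = A @ E.T                                       # Eq. (16)
X3_hat = B @ F.T                                       # Eq. (17)

# Weight on activity component r for the pair (user i, location j).
W = np.einsum("ip,jq,pqr->ijr", A, B, D)
```

Contracting `W` with `C` over `r` gives `X1_hat` again, which makes explicit that the core tensor lets every user component interact with every location component.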

5 Experimental Results 

In this section, we assess the performance of the coupled models proposed in the 
previous section in terms of missing link prediction. First, we demonstrate that 
coupled tensor factorisations outperform low-rank approximations of a single 
tensor in terms of missing link prediction. Then we compare different tensor 
models and loss functions and show that selection of the tensor model and loss 
function is significant in terms of link prediction performance, especially when 
the fraction of unobserved elements is high. Furthermore, we study the case 
with completely missing slices, which corresponds to the cold-start problem in 
our link prediction setting and demonstrate that it is still possible to predict 
missing links using the proposed coupled models. 

5.1 Experimental Setting 

We design experiments to evaluate the performance of our models in terms of
link prediction. By setting different amounts of data to missing in the user-location-
activity tensor $X_1$, we compare the following models using both KL-divergence
and the Euclidean distance as cost functions:

— Low-rank approximations of a single tensor: (i) CP and (ii) Tucker factorisation
of the user-location-activity tensor $X_1$,

— Coupled tensor factorisations: (i) CP factorisation of $X_1$ coupled with factorisation
of the user-location matrix $X_2$ and the location-feature matrix $X_3$, (ii)
Tucker factorisation of $X_1$ coupled with factorisation of $X_2$ and $X_3$.

We use two missing data patterns: (i) randomly missing entries, (ii) randomly
missing slices. In all experiments, the number of components, i.e., the number of columns
in each factor matrix $Z_\alpha$, is set to 2. To measure the link prediction performance,
we use AUC (Area Under the Receiver Operating Characteristic Curve). 
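
The two missing-data patterns and the AUC score can be sketched as follows. The function names are hypothetical, and scoring only the held-out entries (those with mask value 0) is an assumption about the protocol rather than something stated in the text. The AUC is computed in its pairwise form, which is convenient for small test sets but memory-hungry for large ones.

```python
import numpy as np

def random_entry_mask(shape, frac_missing, rng):
    # 1 = observed, 0 = missing; entries are dropped independently.
    return (rng.random(shape) >= frac_missing).astype(float)

def random_slice_mask(shape, n_users, rng):
    # Hide every entry of n_users randomly chosen user slices.
    M = np.ones(shape)
    M[rng.choice(shape[0], size=n_users, replace=False)] = 0.0
    return M

def auc(labels, scores):
    # Probability that a random positive is scored above a random negative
    # (ties count one half); equals the area under the ROC curve.
    labels = np.asarray(labels).ravel().astype(bool)
    scores = np.asarray(scores).ravel()
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)

rng = np.random.default_rng(0)
M_entries = random_entry_mask((6, 4, 3), 0.8, rng)
M_slices = random_slice_mask((6, 4, 3), 2, rng)
example = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])   # 3 of 4 pairs ordered correctly
```

Given a mask `M`, a binary tensor `X1` and a model reconstruction `X1_hat`, the held-out score under this assumed protocol would be `auc(X1[M == 0], X1_hat[M == 0])`.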
5.2 Results 



In order to demonstrate the power of coupled analysis, we compared the link
prediction performance of standard CP and Tucker models with coupled ones
using EUC and KL cost functions at different percentages, i.e., {40, 60, 80, 90, 95}%,
of randomly unobserved elements. In all cases, the coupled models clearly outperform the
standard models. Figure 2 shows the comparison of CP and coupled CP
models with different cost functions when 80% of the data is missing. As we
can see, coupled models using additional information perform better than the
standard models, in particular when the percentage of missing data is high.
When the fraction of missing data was more than 80%, the standard models
could not find a solution.
Fig. 2: Comparison of CP and Coupled(CP) models: (a) EUC with 80% missing, (b) KL with 80% missing



In order to demonstrate the effect of the cost function used to model the data,
we have also carried out experiments on both coupled CP and Tucker models
at different missing data fractions. In all cases, the KL cost function seems to
perform better than EUC, especially when the fraction of missing entries is high.
Figure 3 illustrates the performance of Euclidean distance and Kullback-Leibler
divergence for both coupled CP and Tucker models when 90% of the data is
unobserved.

Finally, Figure 4 compares the coupled CP and Tucker models
in order to illustrate which tensor model fits the data best. We can see that
the Tucker model outperforms the CP model; the Tucker model is more flexible
due to its full core tensor, which helps to explore the structural
information embedded in the data.



Fig. 3: Comparison of EUC distance and KL divergence with 90% missing data

Fig. 4: Comparison of Coupled CP and Tucker models with KL

Cold Start Problem: We also study the missing slice problem, which is particularly
important in link prediction because we may often have new users starting
to use an application, e.g., a location-activity recommender system. Since they
are new users, they will have no entry in $X_1$, i.e., a completely missing slice. It
is not possible to reconstruct a missing slice of a tensor using its low-rank
approximation. A similar argument is valid in the case of matrices for completely
missing rows/columns [21]. In such cases, additional sources of information will
be useful to make recommendations to new users. We observe that our coupled
models could predict the links when there is no information about a user in tensor
$X_1$, by utilizing the additional sources of information. We test this case by
setting slices to missing randomly in $X_1$. Figure 5 demonstrates the performance
of coupled models with KL divergence when 10 users' data and 50 users' data
are missing. Note that Tucker is superior to CP as the amount of missing data
increases.



Fig. 5: Link prediction result with missing slices and KL cost

6 Conclusions

In this study, we have addressed the link prediction problem using coupled analysis of
relational data represented as datasets in the form of matrices and higher-order
tensors. The problem is formulated as simultaneous factorisation of higher-order
tensors/matrices extracting common latent factors from the shared modes. While
most existing studies on coupled analysis have been developed to fit a specific
type of tensor model using a particular loss function, we have used the Generalized
Coupled Tensor Factorisation framework, which enables us to develop coupled
models for joint analysis of multiple data sets using various tensor models and
cost functions. In our coupled analysis for link prediction, we have studied both
KL-divergence and Euclidean distance-based cost functions as well as different
tensor models. Our numerical results on a real GPS dataset demonstrate that the
selection of the tensor model and the loss function is important in terms of link
prediction performance. While our experiments have been limited to a dataset
that is not large-scale, the updates used in GCTF respect the sparsity pattern
in the data; therefore, we expect the proposed approach to scale to large data. We plan to
extend our study in that direction and show its applicability on large-scale data.